I’ve always been interested in the statistics and sociology of crime. Questions like ‘Is there a connection between crime and poverty?’, ‘Does the level of education correlate with the crime rate?’ or ‘Do the factors correlated with property crimes differ from those correlated with personal crimes?’ have held my attention for a long time.
This paper is an attempt to find which demographic factors correlate with the level of crime, using techniques from Udacity’s course ‘Exploratory data analysis with R’. I’ll try to select demographic factors that might correlate with the crime rate and cover the following broad topics:
My starting point was to figure out how to get as granular a dataset as possible, both for crime and for demographic data. As it turns out, several major US cities publish a dataset of every reported crime, with a description of the crime and, even more importantly, its geographical location. I obtained these datasets for two large US cities, Los Angeles and Chicago, for the year 2013.
The primary source of demographic data for the United States is the United States Census Bureau. It provides demographic data at various geographic levels through its site. The smallest geographical unit for which the bureau publishes sample data is the census block group. Additionally, the bureau provides the geographical shapes of these block groups. Since the crime datasets contain the geographical coordinates of each crime, this allows us to determine the block group of every reported crime.
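To illustrate how a crime's coordinates can be matched to a block group, here is a minimal point-in-polygon sketch in plain Python. It is not the pipeline used for this paper (that work was done with PostGIS); the block group IDs and square shapes below are made up, and real block group boundaries are far more complex.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # ring of vertices, last vertex connects back to the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_block_group(point: Point, block_groups: Dict[str, Polygon]) -> Optional[str]:
    """Return the ID of the first block group containing the point, or None."""
    for geoid, polygon in block_groups.items():
        if point_in_polygon(point, polygon):
            return geoid
    return None


# Hypothetical block groups: two adjacent unit squares with made-up IDs
block_groups = {
    "BG-A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "BG-B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

crime_location = (1.5, 0.5)
print(find_block_group(crime_location, block_groups))
```

A linear scan over every polygon is fine for a sketch, but with tens of thousands of block groups and hundreds of thousands of crime reports a spatial index, such as the one a spatial database provides, is the more practical choice.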
The next step was to determine which demographic factors to select for this paper. There is plenty of research on factors influencing crime. Most of it cites these demographic parameters as the ones with the most influence on the crime rate:
Within this paper I decided to focus on the following variables:
All of this data is available for download at the census block group level through census.gov.
Creating a clean dataset for this paper was a long and time-consuming task. It involved multiple steps and multiple technologies: Excel, GIS, R and SQL. I used SQL for loading and merging data, and PostGIS, a geographical extension for the Postgres database, for geographical calculations. I also used Excel and R for cleaning up the downloaded datasets. The process is described in detail in a separate document.
As part of cleaning up the dataset, I had to manually classify each crime as a personal or a property crime. The original crime datasets contained a “crime type” column, but its values were more granular than that. Moreover, each police department has its own set of crime types. In my case this column contained rather obvious types, such as ‘Arson’, ‘Assault’, ‘Homicide’ or ‘Theft’, as well as more obscure or granular types, such as ‘OTHER MISCELLANEOUS CRIME’ or ‘THEFT, COIN MACHINE’. I constructed a separate dataset that maps each reported crime type from the original datasets to one of the crime types used in this research: ‘personal’, ‘property’ or ‘other’.
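A lookup table of this kind can be sketched in a few lines of Python. The category assignments below are illustrative assumptions, not the actual mapping built for this paper, and `categorize` is a hypothetical helper name.

```python
# Illustrative mapping from reported crime type to research category.
# The assignments are assumptions for this sketch, not the author's mapping.
CATEGORY_BY_TYPE = {
    "ARSON": "property",
    "ASSAULT": "personal",
    "HOMICIDE": "personal",
    "THEFT": "property",
    "THEFT, COIN MACHINE": "property",
    "OTHER MISCELLANEOUS CRIME": "other",
}


def categorize(crime_type: str) -> str:
    """Normalize case and whitespace, then look the type up in the mapping."""
    key = " ".join(crime_type.upper().split())
    # Types missing from the mapping fall back to 'other'
    return CATEGORY_BY_TYPE.get(key, "other")


for reported in ["Arson", "  theft,   coin machine ", "Jaywalking"]:
    print(reported.strip(), "->", categorize(reported))
```

Falling back to ‘other’ keeps the sketch short. When building the mapping by hand it may be safer to raise an error for unmapped types, so that no crime type is silently misclassified.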
In the end I constructed two datasets about crime and demographic statistics in Los Angeles and Chicago:
In this section I will look at the two datasets in more detail and try to provide a deeper overview of the crime situation in both cities. Let’s start by exploring basic descriptive statistics of the crime reports dataset.
There are 535,912 crime reports in our dataset: 304,372 (56.8%) were reported in Chicago and 231,540 (43.2%) in Los Angeles. Crime rates (number of reported crimes per 100,000 residents) are given in the table below:
| | personal | property | total |
|---|---|---|---|
| Chicago | 4,169 | 5,834 | 11,248 |
| Los Angeles | 2,558 | 3,356 | 6,105 |
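For reference, a rate per 100,000 is simply the number of reports divided by the population and scaled by 100,000. The sketch below uses made-up numbers, because the population figures behind the table are not listed here; `crime_rate_per_100k` is a hypothetical helper name.

```python
def crime_rate_per_100k(reported_crimes: int, population: int) -> float:
    """Number of reported crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return reported_crimes / population * 100_000


# Hypothetical area: 1,250 reports among 50,000 residents
rate = crime_rate_per_100k(1_250, 50_000)
print(f"{rate:,.0f} crimes per 100,000 residents")
```

Scaling by population is what makes the two cities comparable, since a raw count mostly reflects how many people live in each city.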
And this is the breakdown between crime types and cities in absolute numbers:
| | personal | property | other |
|---|---|---|---|
| Chicago | 112,830 | 157,875 | 33,667 |
| Los Angeles | 97,020 | 127,297 | 7,223 |
Chicago had more reported crime in 2013 than Los Angeles, both in absolute and especially in relative numbers. The difference in crime rate is especially large for personal crimes: Chicago’s rate of personal crimes is around 60% higher than that of Los Angeles. In both cities, most crimes were property crimes.
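These figures can be rechecked directly from the two tables. The short Python sketch below uses only the numbers reported above; the variable names are mine.

```python
# Absolute counts and personal crime rates, copied from the tables above
counts = {
    "Chicago": {"personal": 112_830, "property": 157_875, "other": 33_667},
    "Los Angeles": {"personal": 97_020, "property": 127_297, "other": 7_223},
}
personal_rate = {"Chicago": 4_169, "Los Angeles": 2_558}  # per 100,000

city_totals = {city: sum(by_type.values()) for city, by_type in counts.items()}
grand_total = sum(city_totals.values())

# Share of all reports that comes from each city
city_share = {city: total / grand_total for city, total in city_totals.items()}

# Share of each city's reports that are property crimes
property_share = {
    city: by_type["property"] / city_totals[city] for city, by_type in counts.items()
}

# How much higher Chicago's personal crime rate is, relative to Los Angeles
relative_diff = personal_rate["Chicago"] / personal_rate["Los Angeles"] - 1

for city in counts:
    print(
        f"{city}: {city_totals[city]:,} reports, "
        f"{city_share[city]:.1%} of all reports, "
        f"{property_share[city]:.1%} property crimes"
    )
print(f"Chicago personal crime rate is {relative_diff:.0%} higher")
```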
Let’s look at the time when crimes are happening. Hourly patterns by crime type and city are plotted below.
Some patterns are the same across cities and crime types. For example, the lowest crime rate occurs during the early morning hours, between 4 and 6 AM. It is also surprising that crimes tend to be reported more often at odd hours, as we can see from the jagged lines in all the facets.
However, crime in Chicago and crime in Los Angeles differ in several ways. As the tables above show, the personal crime rate is higher in Chicago than in Los Angeles. Another difference is that property crimes tend to happen at different times in the two cities: the share of reported property crimes peaks at 9 AM in Chicago, while in Los Angeles it peaks at noon.
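The hourly shares behind such a plot come from counting reports per hour of day. Here is a minimal Python sketch of that calculation, with made-up timestamps and an assumed timestamp format; it is not the R code used for the plots.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def hourly_share(timestamps: List[str], fmt: str = "%Y-%m-%d %H:%M") -> Dict[int, float]:
    """Share of reports falling into each hour of the day (0-23)."""
    if not timestamps:
        raise ValueError("no timestamps given")
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    total = sum(hours.values())
    return {hour: hours.get(hour, 0) / total for hour in range(24)}


# Hypothetical report times for one city and one crime type
reports = [
    "2013-03-01 09:05",
    "2013-03-01 09:40",
    "2013-03-02 12:00",
    "2013-03-02 04:30",
]

shares = hourly_share(reports)
peak_hour = max(shares, key=shares.get)
print(f"peak hour: {peak_hour}:00 with {shares[peak_hour]:.0%} of reports")
```

Using shares instead of raw counts makes the shape of the daily curve comparable between cities that have very different numbers of reports.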
Let’s look at weekday patterns.
Here we see slightly different crime patterns in the two cities. Most crimes in Chicago are reported at the beginning of the week, while in Los Angeles they tend to happen in the middle of the week.
Finally, let’s plot crime reports on maps of the respective cities.
This is the most interesting plot so far. Both personal and property crimes in Los Angeles are concentrated in a single area around Skid Row and Downtown Los Angeles. In Chicago, though, the two types of crime are concentrated in completely different areas. Property crimes are clustered around the Chicago city center: Near North Side, Chicago Loop and River North. Personal crimes are concentrated heavily in the western areas of Chicago: North and South Lawndale and Near West Side.
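A weekday breakdown like this one can be produced by grouping report dates by city and day of the week. The Python sketch below uses a handful of made-up records purely to show the grouping; the names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_counts(records: List[Tuple[str, date]]) -> Dict[str, Dict[str, int]]:
    """Count reports per city and weekday, keeping weekdays in calendar order."""
    counter = Counter((city, WEEKDAYS[day.weekday()]) for city, day in records)
    cities = sorted({city for city, _ in records})
    return {city: {wd: counter.get((city, wd), 0) for wd in WEEKDAYS} for city in cities}


# Hypothetical records: (city, report date)
records = [
    ("Chicago", date(2013, 1, 7)),       # a Monday
    ("Chicago", date(2013, 1, 7)),
    ("Chicago", date(2013, 1, 13)),      # a Sunday
    ("Los Angeles", date(2013, 1, 9)),   # a Wednesday
]

by_city = weekday_counts(records)
for city, counts_by_day in by_city.items():
    busiest = max(counts_by_day, key=counts_by_day.get)
    print(f"{city}: busiest weekday is {busiest}")
```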
Let’s start by plotting our demographic variables on a city map to find visual clues about the connection between demographics and crime level. This is a map of Los Angeles with four demographic variables on it:
The same variables for Chicago look like this:
There is a visible relationship between median income and education level on one side and crime level on the other. In both cities, median income and education level are lower in the areas where the personal crime rate is higher. However, there is no clearly visible pattern for density and unemployment level.
Let’s move to our main research question, namely finding out whether there is any correlation between the four selected demographic variables and the level of crime.
Scatterplots of the number of crimes of each type against the demographic parameters, by block group, are plotted below:
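As a reminder of what "correlation" means here, the sketch below computes the Pearson correlation coefficient by hand in Python. The per-block-group values are invented for illustration and are not taken from the datasets; choosing Pearson over another measure is also my assumption.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical block groups: median income and number of personal crimes
median_income = [30_000, 45_000, 60_000, 75_000, 90_000]
personal_crimes = [40, 31, 25, 12, 8]

r = pearson_r(median_income, personal_crimes)
print(f"Pearson r = {r:.2f}")
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 indicates none. Pearson's coefficient only captures linear association, so it is worth reading it alongside the scatterplots.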